An Unsupervised Approach to Product Attribute Extraction
Authors
Abstract
Product attribute extraction is the task of automatically discovering the attributes of products from their text descriptions. In this paper, we propose a new unsupervised, domain-independent approach to extracting these attributes. In our experiments, the approach achieves 92% precision and 62% recall. Experiments with varying dataset sizes show that the algorithm is robust, and we also show that as few as 5 descriptions provide enough information to identify attributes.
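To make the reported figures concrete: precision is the fraction of extracted attributes that are correct, and recall is the fraction of true attributes that were extracted. The sketch below assumes a simple set-based evaluation against a gold attribute list. The abstract does not state the exact evaluation protocol, and the laptop attribute sets are hypothetical, not drawn from the paper's data. The helper names `precision_recall` and `f1` are illustrative.

```python
def precision_recall(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    """Set-based precision and recall (assumed protocol, for illustration).

    precision = |extracted & gold| / |extracted|
    recall    = |extracted & gold| / |gold|
    """
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and extracted attributes for one product category.
gold = {"brand", "cpu", "ram", "screen size", "weight", "battery life"}
extracted = {"brand", "cpu", "ram", "color"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")
# precision=0.75 recall=0.50 f1=0.60

# F1 implied by the figures in the abstract (92% precision, 62% recall).
print(f"{f1(0.92, 0.62):.3f}")  # 0.741
```

The gap between the two reported figures shows that the approach favors precision. Most attributes it proposes are correct, but it misses a substantial share of the true ones.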
Similar Resources
DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web
The web is a rich resource of structured data. There has been an increasing interest in using web structured data for many applications such as data integration, web search and question answering. In this paper, we present DEXTER, a system to find product sites on the web, and detect and extract product specifications from them. Since product specifications exist in multiple product sites, our ...
An Unsupervised Approach for Product Record Normalization across Different Web Sites
An unsupervised probabilistic learning framework for normalizing product records across different retailer Web sites is presented. Our framework decomposes the problem into two tasks to achieve the goal. The first task aims at extracting attribute values of products from different sites and normalizing them into appropriate reference attributes. This task is challenging because the set of refer...
Entity Attribute Extraction from Unstructured Text with Deep Belief Network
Entity attribute extraction is an extremely challenging research area with broad application prospects. In this paper, we propose a new approach to extracting entities' attributes from an unstructured text corpus gathered from the Web. The proposed method is an unsupervised machine learning method that extracts entity attributes using a DBN. To test the proposed method, we use it to ext...
Generalizing Syntactic Structures for Product Attribute Candidate Extraction
Previous work has always treated the noun phrases (NPs) in a product review as the product attribute candidates. However, this limits the recall of product attribute extraction. We therefore propose a novel approach that generalizes the syntactic structures of product attributes with two strategies: intuitive heuristics and syntactic structure similarity. Experiments show that the pr...
OPINE: Extracting Product Features and Opinions from Reviews
Consumers have to often wade through a large number of on-line reviews in order to make an informed product choice. We introduce OPINE, an unsupervised, high-precision information extraction system which mines product reviews in order to build a model of product features and their evaluation by reviewers.